AITopics

Country: Asia > China (0.28)

Genre: Research Report (0.68)

Industry: Transportation (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Neural Information Processing Systems, Feb-15-2026, 22:13:24 GMT

9213010cbcd6ba8e1f1cf1533835d51c-Paper-Conference.pdf

machine learning, natural language, reinforcement learning

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.68)

Industry: Transportation (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Neural Information Processing Systems, Feb-11-2026, 09:07:23 GMT

af9c9c6d2da701da5a0acf91ec217815-Paper-Datasets_and_Benchmarks.pdf

dataset, human pose estimation, pose estimation

Country:

North America > United States > Wisconsin > Dane County > Madison (0.05)
North America > United States > Texas (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.43)

Neural Information Processing Systems, Dec-24-2025, 07:50:39 GMT

Compressed Video Contrastive Learning

This work concerns self-supervised video representation learning (SSVRL), a topic that has received much attention recently. Since videos are storage-intensive and contain a rich source of visual content, models designed for SSVRL are expected to be storage- and computation-efficient, as well as effective. However, most existing methods focus on only one of these two objectives, failing to consider both at the same time. In this work, for the first time, these seemingly contradictory goals are achieved simultaneously by exploiting compressed videos and capturing mutual information between two input streams. Specifically, a novel Motion Vector based Cross Guidance Contrastive learning approach (MVCGC) is proposed. For storage and computation efficiency, we choose to directly decode RGB frames and motion vectors (which resemble low-resolution optical flows) from compressed videos on the fly. To enhance the representation ability of the motion vectors, and hence the effectiveness of our method, we design a cross guidance contrastive learning algorithm based on a multi-instance InfoNCE loss, where motion vectors can take supervision signals from RGB frames and vice versa. Comprehensive experiments on two downstream tasks show that MVCGC yields new state-of-the-art results while being significantly more efficient than its competitors.
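The cross guidance idea can be illustrated with a generic multi-instance InfoNCE loss, in which each anchor may have several positives, together with a step that mines those positives from the other modality. The NumPy sketch below is an assumption-laden illustration, not the MVCGC implementation: the function names `multi_instance_infonce` and `topk_positive_mask`, the top-k mining rule, and the temperature value are hypothetical choices.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each row to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def topk_positive_mask(guide_feats, k=2):
    """Hypothetical cross-guidance step: for each sample, mark the k most
    similar samples (cosine similarity in the *other* modality) as positives.
    The sample itself ranks first, so it is always included."""
    g = l2_normalize(guide_feats)
    sim = g @ g.T
    idx = np.argsort(-sim, axis=1)[:, :k]
    mask = np.zeros(sim.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask

def multi_instance_infonce(anchors, candidates, pos_mask, temperature=0.07):
    """Generic multi-instance InfoNCE (hypothetical sketch).

    anchors:    (N, D) embeddings, e.g. from the motion-vector stream
    candidates: (M, D) embeddings to contrast against, e.g. RGB features
    pos_mask:   (N, M) boolean, True marks a positive pair
    Returns mean over anchors of -log(sum_pos exp(s/t) / sum_all exp(s/t)).
    """
    a = l2_normalize(anchors)
    c = l2_normalize(candidates)
    logits = a @ c.T / temperature                        # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    pos = (exp * pos_mask).sum(axis=1)
    total = exp.sum(axis=1)
    return float(np.mean(-np.log(pos / total)))
```

In use, `mask = topk_positive_mask(rgb_feats)` followed by `multi_instance_infonce(mv_feats, rgb_feats, mask)` would let RGB similarity decide which motion-vector pairs count as positives; swapping the roles gives the "vice versa" direction.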

compressed video contrastive learning, motion vector, name change

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Jawaid, Mohsi, Märtens, Marcus, Chin, Tat-Jun

Event-RGB Fusion for Spacecraft Pose Estimation Under Harsh Lighting

arXiv.org Artificial Intelligence, Nov-4-2025

Spacecraft pose estimation is crucial for autonomous in-space operations, such as rendezvous, docking and on-orbit servicing. Vision-based pose estimation methods, which typically employ RGB imaging sensors, are a compelling solution for spacecraft pose estimation, but are challenged by harsh lighting conditions, which produce imaging artifacts such as glare, over-exposure, blooming and lens flare. Due to their much higher dynamic range, neuromorphic or event sensors are more resilient to extreme lighting conditions. However, event sensors generally have lower spatial resolution and suffer from reduced signal-to-noise ratio during periods of low relative motion. This work introduces a sensor fusion approach that combines RGB and event sensors. A beam-splitter prism was employed to achieve precise optical and temporal alignment. Then, a RANSAC-based technique was developed to fuse the information from the RGB and event channels to achieve pose estimation that leveraged the strengths of the two modalities. The pipeline was complemented by dropout uncertainty estimation to detect extreme conditions that affect either channel. To benchmark the performance of the proposed event-RGB fusion method, we collected a comprehensive real dataset of RGB and event data for satellite pose estimation in a laboratory setting under a variety of challenging illumination conditions. Encouraging results on the dataset demonstrate the efficacy of our event-RGB fusion approach and further support the use of event sensors for spacecraft pose estimation. To support community research on this topic, our dataset has been released publicly.

Keywords: event-based pose estimation, rendezvous, domain gap, sensor fusion, close proximity, harsh lighting

1. Introduction

Spacecraft pose estimation is the problem of determining the 6-degrees-of-freedom (6DoF) pose, consisting of the position and orientation, of a space-borne object, typically a satellite. It is a critical step in a wide range of space applications, including rendezvous, close proximity operations, debris removal, refueling and on-orbit servicing [1, 2, 3, 4]. Robust pose estimation is paramount to safely and effectively executing these tasks [5, 6]. Several types of sensor technologies can be employed for spacecraft pose estimation, but they are all subject to size, weight, power and cost (SWaP-C) constraints. Optical sensors such as RGB imaging sensors are favored due to their low SWaP-C requirements, high resolution and the availability of established vision-based algorithms. However, operating in the space environment can present nontrivial challenges to vision-based systems.
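The abstract does not specify the RANSAC formulation, so the following is only a toy illustration of the general idea: keypoint correspondences from the RGB and event channels are pooled, and RANSAC keeps the model with the largest consensus set, so a channel degraded by glare or low motion mostly contributes outliers. For brevity the sketch fits a 2D rigid transform (rotation plus translation) with the Kabsch method instead of a full 6DoF pose; the function names `fit_rigid_2d` and `ransac_fuse`, the inlier threshold and the iteration count are all assumptions.

```python
import numpy as np

def fit_rigid_2d(src, dst):
    """Least-squares rotation R and translation t with dst ~= src @ R.T + t (Kabsch)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # correct a reflection into a rotation
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = cd - cs @ R.T
    return R, t

def ransac_fuse(src_rgb, dst_rgb, src_evt, dst_evt, thresh=0.05, iters=200, seed=0):
    """Pool RGB and event correspondences, then run RANSAC (hypothetical sketch)."""
    src = np.vstack([src_rgb, src_evt])
    dst = np.vstack([dst_rgb, dst_evt])
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=2, replace=False)  # minimal sample for 2D rigid
        R, t = fit_rigid_2d(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 2:
        raise ValueError("RANSAC found no usable consensus set")
    R, t = fit_rigid_2d(src[best_inliers], dst[best_inliers])  # refit on all inliers
    return R, t, best_inliers
```

Pooling before sampling, rather than running RANSAC per channel, is one simple way to let whichever channel is healthy dominate the consensus; the paper's actual fusion strategy may differ.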

artificial intelligence, machine learning, pose estimation

doi: 10.1016/j.ast.2025.111039

2507.05698

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.45)

Industry: Energy > Renewable (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Venkatesh, Danush Kumar, Schmidt, Adam, Jamal, Muhammad Abdullah, Mohareri, Omid

Mitigating Surgical Data Imbalance with Dual-Prediction Video Diffusion Model

arXiv.org Artificial Intelligence, Oct-10-2025

Surgical video datasets are essential for scene understanding, enabling procedural modeling and intra-operative support. However, these datasets are often heavily imbalanced, with rare actions and tools under-represented, which limits the robustness of downstream models. We address this challenge with SurgiFlowVid, a sparse and controllable video diffusion framework for generating surgical videos of under-represented classes. Our approach introduces a dual-prediction diffusion module that jointly denoises RGB frames and optical flow, providing temporal inductive biases to improve motion modeling from limited samples. In addition, a sparse visual encoder conditions the generation process on lightweight signals (e.g., sparse segmentation masks or RGB frames), enabling controllability without dense annotations. We validate our approach on three surgical datasets across tasks including action recognition, tool presence detection, and laparoscope motion prediction. Synthetic data generated by our method yields consistent gains of 10-20% over competitive baselines, establishing SurgiFlowVid as a promising strategy to mitigate data imbalance and advance surgical video understanding methods.
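As a rough illustration of what jointly denoising RGB frames and optical flow can mean, the sketch below applies the standard DDPM forward process to an RGB clip and its flow with one shared timestep, then sums the noise-prediction errors of the two outputs. Everything here is an assumption for illustration: the linear beta schedule, the weight `flow_weight`, the array shapes, and the placeholder `predict_noise` callable stand in for the paper's network and its sparse conditioning, which the abstract does not detail.

```python
import numpy as np

def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear DDPM beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, noise, alpha_bar):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

def dual_prediction_loss(rgb, flow, predict_noise, alpha_bar, rng, flow_weight=1.0):
    """Joint epsilon-prediction loss over RGB and optical flow (hypothetical sketch).

    rgb:  (F, H, W, 3) frames; flow: (F-1, H, W, 2) flow fields (assumed shapes)
    predict_noise(x_rgb_t, x_flow_t, t) -> (eps_rgb_hat, eps_flow_hat)
    """
    t = rng.integers(len(alpha_bar))          # one timestep shared by both modalities
    eps_rgb = rng.standard_normal(rgb.shape)
    eps_flow = rng.standard_normal(flow.shape)
    x_rgb_t = q_sample(rgb, t, eps_rgb, alpha_bar)
    x_flow_t = q_sample(flow, t, eps_flow, alpha_bar)
    eps_rgb_hat, eps_flow_hat = predict_noise(x_rgb_t, x_flow_t, t)
    loss_rgb = np.mean((eps_rgb_hat - eps_rgb) ** 2)
    loss_flow = np.mean((eps_flow_hat - eps_flow) ** 2)
    return loss_rgb + flow_weight * loss_flow
```

Because the predictor sees both noisy inputs at once, a real model trained this way can use flow to inform the RGB estimate and vice versa, which is one reading of the "temporal inductive bias" the abstract mentions.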

large language model, machine learning, natural language

2510.07345

Country: Europe (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)

arXiv.org Artificial Intelligence, Oct-6-2025

Representation Learning for Compressed Video Action Recognition via Attentive Cross-modal Interaction with Motion Enhancement

Li, Bing, Chen, Jiaxin, Zhang, Dongming, Bao, Xiuguo, Huang, Di

Compressed video action recognition has recently drawn growing attention, since it remarkably reduces storage and computational cost by replacing raw videos with sparsely sampled RGB frames and compressed motion cues (e.g., motion vectors and residuals). However, this task suffers severely from coarse and noisy dynamics and from insufficient fusion of the heterogeneous RGB and motion modalities. To address these two issues, this paper proposes a novel framework, namely the Attentive Cross-modal Interaction Network with Motion Enhancement (MEACI-Net). It follows a two-stream architecture, i.e., one stream for the RGB modality and the other for the motion modality. In particular, the motion stream employs a multi-scale block embedded with a denoising module to enhance representation learning. The interaction between the two streams is then strengthened by introducing the Selective Motion Complement (SMC) and Cross-Modality Augment (CMA) modules, where SMC complements the RGB modality with spatio-temporally attentive local motion features and CMA further combines the two modalities with selective feature augmentation. Extensive experiments on the UCF-101, HMDB-51 and Kinetics-400 benchmarks demonstrate the effectiveness and efficiency of MEACI-Net.
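SMC is described as complementing RGB features with attentive motion features. A common way to realize such a complement is cross-attention in which RGB tokens query motion tokens and the attended result is added back residually. The NumPy sketch below shows only that generic pattern; it is not the MEACI-Net SMC or CMA module, and the projection matrices `Wq`, `Wk`, `Wv`, the function name and the token shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_complement(rgb_tokens, motion_tokens, Wq, Wk, Wv):
    """RGB tokens attend to motion tokens; the result is added residually.

    rgb_tokens:    (N, D) e.g. flattened spatio-temporal RGB features
    motion_tokens: (M, D) features from the motion-vector/residual stream
    Wq, Wk, Wv:    (D, D) projection matrices (hypothetical, normally learned)
    Returns fused (N, D) features and the (N, M) attention map.
    """
    q = rgb_tokens @ Wq
    k = motion_tokens @ Wk
    v = motion_tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # each row sums to 1
    return rgb_tokens + attn @ v, attn
```

The residual form keeps the RGB features intact when the motion contribution is uninformative (for example, if `Wv` is zero the output equals the RGB input), which is one way to guard against the noisy motion cues the abstract describes.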

artificial intelligence, deep learning, machine learning

2205.03569

Country: Asia > China (0.29)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Vinod, Krishna, Ramesh, Prithvi Jai, N, Pavan Kumar B, Chakravarthi, Bharatesh

SEBVS: Synthetic Event-based Visual Servoing for Robot Navigation and Manipulation

arXiv.org Artificial Intelligence, Aug-26-2025

Event cameras offer microsecond latency, high dynamic range, and low power consumption, making them ideal for real-time robotic perception under challenging conditions such as motion blur, occlusion, and illumination changes. However, despite their advantages, synthetic event-based vision remains largely unexplored in mainstream robotics simulators. This lack of simulation support hinders the evaluation of event-driven approaches for robotic manipulation and navigation tasks. This work presents an open-source, user-friendly v2e Robot Operating System (ROS) package for Gazebo simulation that enables seamless event stream generation from RGB camera feeds. The package is used to investigate event-based robotic policies (ERPs) for real-time navigation and manipulation. Two representative scenarios are evaluated: (1) object following with a mobile robot and (2) object detection and grasping with a robotic manipulator. Transformer-based ERPs are trained by behavior cloning and compared to RGB-based counterparts under various operating conditions. Experimental results show that event-guided policies consistently deliver competitive advantages. The results highlight the potential of event-driven perception to improve real-time robotic navigation and manipulation, providing a foundation for broader integration of event cameras into robotic policy learning.
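v2e converts conventional frames into synthetic events. The core idea behind such emulators is the idealized DVS pixel model: a pixel emits an ON (+1) or OFF (-1) event whenever its log intensity moves more than a contrast threshold away from the level at which it last fired. The sketch below implements only this idealized model between consecutive frames; v2e itself additionally models sensor non-idealities such as noise and limited pixel bandwidth and can interpolate between frames, and the function name `frames_to_events` and its parameters are assumptions rather than the package's API.

```python
import numpy as np

def frames_to_events(frames, threshold=0.2, eps=1e-3):
    """Idealized DVS emulation from a (T, H, W) stack of intensities in [0, 1].

    Returns a list of (frame_index, y, x, polarity) tuples. A pixel emits one
    event per full threshold crossing of its log intensity, so large changes
    produce several events within one frame interval.
    """
    log_frames = np.log(frames.astype(np.float64) + eps)
    ref = log_frames[0].copy()                    # log level at each pixel's last event
    events = []
    for k in range(1, len(log_frames)):
        diff = log_frames[k] - ref
        n = np.fix(diff / threshold).astype(int)  # signed crossings, rounded toward zero
        ys, xs = np.nonzero(n)
        for y, x in zip(ys, xs):
            pol = 1 if n[y, x] > 0 else -1
            events.extend([(k, int(y), int(x), pol)] * abs(int(n[y, x])))
        ref += n * threshold                      # advance reference by emitted steps only
    return events
```

Keeping the sub-threshold remainder in `ref` means a slow, steady brightness change still produces events once it accumulates, rather than being lost between frames.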

artificial intelligence, machine learning, manipulation

2508.17643

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Neural Information Processing Systems, Aug-17-2025, 18:07:07 GMT

mRI: Multi-modal 3D Human Pose Estimation Dataset using mmWave, RGB-D, and Inertial Sensors

We perform extensive experiments using our dataset and delineate the strength of each modality.

artificial intelligence, machine learning, pose estimation

Country:

North America > United States > Wisconsin > Dane County > Madison (0.05)
North America > United States > Texas (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.43)

Neural Information Processing Systems, Aug-15-2025, 06:27:56 GMT

Compressed Video Contrastive Learning

Existing state-of-the-art methods [Han et al., 2020b; Tao et al., 2020; Huo et al., 2021] mainly focus … More details can be found in Table 1. This clearly hinders large-scale video self-supervised training.

motion vector, mvcgc, representation

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)